# Overview of ARM Architecture & Cortex-M Processors

Lecture 3

Yeongpil Cho

Hanyang University

# **Topics**

- ARM Architecture
  - Overview
- Cortex-M Processors
  - Overview
  - Programmer's model

#### ARM Architecture Overview

# **ARM**

#### **ARM Powered Products**



39u10 The ARM Architecture

#### **ARM**

- ARM is best known for its range of RISC processor core designs
  - Other products fabric IP, software tools, models, cell libraries to help partners develop and ship ARM-based SoCs (System-on-Chip)
- ARM does not manufacture silicon
  - Licensed to partners to develop and fabricate new microcontrollers
    - Snapdragon series of Qualcommm
    - A series of Apple
    - Exynos series of Samsung
    - Tegra series of Nvidia

**–** ...

### ARM's IPs (Intellectual Properties) other than processors

#### **Graphic IP**

- ARM Mali-G71
- Mali-DP550 (Display processor)
- Mali-V550 (Video Processor)

#### Other IP

- ARM CoreLink CCI-550 (Cache Coherent Interconnect) • ARM Compiler 6
- CoreLink GIC-500 (Interrupt Controller)
- CoreLink MMU-500 (System Memory Management Unit)
- CoreLink TZC-400 (ARM TrustZone® Controller)
- CoreLink DMC-500/DMC-520 (Dynamic Memory Controller)
- ARM CoreSight M SoC-400 (Debug and Trace)

#### Tools

- ARM Development Studio
- Fixed Virtual Platforms
- ARM Fast Models
- ARM Versatile Express

#### ARM Architecture

- Based upon RISC (Reduced Instruction Set) Architecture with enhancements to meet requirements of embedded applications
  - Fixed length instructions
  - Load-store architecture
    - Memory-to-register load instructions
    - Register-to-memory store instructions
  - A large uniform register file
  - 32-bit architecture (v1-v7), 64-bit architecture (v8-v9)
  - Good speed/power

# **ARM Processor Family**



#### Release dates of ARM cores



#### **ARM Cortex Processors**

- Cortex-A series (Application)
  - High performance processors capable of full Operating System
     (OS) support
  - Applications include smartphones, digital TV, smart books
- Cortex-R series (Real-time)
  - High performance and reliability for real-time applications
  - Applications include automotive braking system, powertrains
- Cortex-M series (Microcontroller)
  - Cost effectiveness and reliability for real-time applications
  - Applications include tiny IoT devices, smart sensors

# **Summary of Processor Characteristics**

|          | Application processors     | Real-time processors       | Microcontroller          |  |  |
|----------|----------------------------|----------------------------|--------------------------|--|--|
|          |                            |                            | processors               |  |  |
| Design   | High clock frequency,      | High clock frequency,      | Short pipeline,          |  |  |
|          | Long pipeline,             | Long to medium             | ultra low power,         |  |  |
|          | High performance,          | pipeline length,           | Deterministic (low       |  |  |
|          | Multimedia support (NEON   | Deterministic (low         | interrupt latency)       |  |  |
|          | instruction set extension) | interrupt latency)         |                          |  |  |
| System   | Memory Management Unit     | Memory Protection          | Memory Protection Unit   |  |  |
| features | (MMU), virtualizing memor  | $\gamma$ Unit (MPU), cache | (MPU), Nested Vectored   |  |  |
|          | cache memory,              | memory, Tightly            | Interrupt Controller     |  |  |
|          | ARM TrustZone® security    | <b>Coupled Memory</b>      | (NVIC), Wakeup Interrupt |  |  |
|          | extension                  | (TCM)                      | Controller (WIC)         |  |  |
| Targeted | Mobile computing, smart    | Industrial                 | Microcontrollers,        |  |  |
| markets  | phones,                    | microcontrollers,          | Deeply embedded          |  |  |
|          | energy-efficient servers,  | automotives,               | systems (e.g. sensors,   |  |  |
|          | high-end microprocessors   | Hard disk controllers,     | MEMS, mixed signal IC),  |  |  |
|          |                            | Baseband modem             | Internet of Things (IoT) |  |  |

11

#### Relative Performance\*





<sup>\*</sup>Represents attainable speeds in 130, 90, 65, or 45nm processes

## ARM processors vs. ARM architectures

#### ARM architecture

- Describes the details of instruction set, programmer's model, exception model, and memory map
- Documented in the Architecture Reference Manual

#### ARM processor

- Developed using one of the Arm architectures
- More implementation details, such as timing information
- Documented in processor's Technical Reference Manual



## Functional Evolutions by Architectures



Remarks: See on the next slide.

#### **Instruction Sets**



#### Instruction State Switches

- The current instruction state (between ARM and Thumb/Thumb-2) is determined depending on the LSB (Least Significant bit) of the branch address. (branches: function calls, jumps, ...)
  - LSB =  $0 \rightarrow ARM$  instruction set
  - LSB =  $1 \rightarrow$  Thumb/Thumb-2 instruction set



#### ARM and Thumb Performance



Memory width (zero wait state)

#### The Thumb-2 instruction set

- Variable-length instructions
  - ARM instructions are a fixed length of 32 bits
  - Thumb instructions are a fixed length of 16 bits
  - Thumb-2 instructions can be either 16-bit or 32-bit
- Thumb-2 gives approximately 25% improvement in performance over Thumb
- Thumb-2 gives approximately 26% reduction in code size over ARM



#### Cortex-M Processor Overview

# Cortex-M Processor Family

| Processor    | Descriptions                                                                                                                            |
|--------------|-----------------------------------------------------------------------------------------------------------------------------------------|
| Cortex-M0    | <b>A very small processor</b> (starting from 12K gates) for low cost, ultra low power microcontrollers and deeply embedded applications |
|              | The most energy-efficient processor for small embedded system. Similar size and programmer's                                            |
| Cortex-M0+   | model to the Cortex-M0 processor, but with additional features like single cycle I/O interface and                                      |
|              | vector table relocations                                                                                                                |
| Cortex-M1    | A small processor design optimized for FPGA designs and provides Tightly Coupled Memory                                                 |
| Cortex-M1    | (TCM) implementation using memory blocks on the FPGAs. Same instruction set as the Cortex-M0                                            |
|              | A small but <b>powerful embedded processor for low-power microcontrollers</b> that has a rich                                           |
| Cortex-M3    | instruction set to enable it to handle complex tasks quicker. It has a hardware divider and Multiply-                                   |
| GOI LEX 1VIS | Accumulate (MAC) instructions. In addition, it also has comprehensive debug and trace features to                                       |
|              | enable software developers to develop their applications quicker                                                                        |
|              | It provides all the features on the Cortex-M3, with <b>additional instructions</b> target at Digital Signal                             |
| Cortex-M4    | Processing (DSP) tasks, such as Single Instruction Multiple Data (SIMD) and faster single cycle                                         |
|              | MAC operations. In addition, it also have an optional single precision floating point unit that                                         |
|              | support IEEE 754 floating point standard                                                                                                |
|              | <b>High-performance processor</b> for high-end microcontrollers and processing intensive                                                |
| Cortex-M7    | applications. It has all the ISA features available in Cortex-M4, with additional support for double-                                   |
|              | precision floating point, as well as additional memory features like cache and Tightly Coupled Memory (TCM)                             |
|              | A small processor for ultra-low power and low cost designs, similar to the <b>Cortex-M0+</b> processor,                                 |
| Cortex-M23   | but with various enhancements in instruction set and system-level features. It also supports the                                        |
| Gortex 1425  | TrustZone security extension.                                                                                                           |
|              | A mainstream processor design, similar to previous Cortex-M3 and Cortex-M4 processors, but                                              |
| Cortex-M33   | with much better flexibility in system design, better energy efficiency and higher performance. It                                      |
|              | also supports the <b>TrustZone security extension</b> .                                                                                 |
|              |                                                                                                                                         |

# ARM Cortex-M series family

Digital Gignal

| Processor      | Arm<br>Architecture | Core<br>Architecture | Thumb® | Thumb®-2 | Hardware<br>Multiply | Hardware<br>Divide | Saturated<br>Math | DSP<br>Extensions | Floating<br>Point |
|----------------|---------------------|----------------------|--------|----------|----------------------|--------------------|-------------------|-------------------|-------------------|
| Cortex-M0      | Armv6-M             | Von Neumann          | Most   | Subset   | 1 or 32 cycle        | No                 | No                | No                | No                |
| Cortex-M0+     | Armv6-M             | Von Neumann          | Most   | Subset   | 1 or 32 cycle        | No                 | No                | No                | No                |
| Cortex-M3      | Armv7-M             | Harvard              | Entire | Entire   | 1 cycle              | Yes                | Yes               | No                | No                |
| Cortex-M4      | Armv7E-M            | Harvard              | Entire | Entire   | 1 cycle              | Yes                | Yes               | Yes               | Optional          |
| Cortex-M7      | Armv7E-M            | Harvard              | Entire | Entire   | 1 cycle              | Yes                | Yes               | Yes               | Optional          |
| Cortex-M23, 33 | Armv8-M             | Harvard              | Entire | Entire   | 1 cycle              | Yes                | Yes               | Yes               | Optional          |

# Instruction Set support in the Cortex-M processor



Floating Point

DSP (SIMD, fast MAC)

Advanced data processing bit field manipulations

General data processing I/O control tasks

# Instruction Set support in the Cortex-M processor



# Cortex-M3 Processor(1/2)

- Hierarchical processor integrating core and advanced system peripherals
- Cortex-M3 Processor (By chip manufacturers)
  - Cortex-M3 core
  - Configurable interrupt controller
    - NVIC: Nested Vectored Interrupt Controller
  - Bus matrix (ICode, Dcode, System Bus)
  - Advanced debug components(ETM...)
  - Optional MPU
- Cortex-M3 core (By ARM)
  - Harvard architecture
  - 3-stage pipeline prediction
  - Thumb®-2
  - ALU w. H/W divide and single cycle multiply



# Cortex-M3 Processor(2/2)



<sup>\*</sup> Preliminary gate counts & power consumption based on initial implementation Gate Counts are based on TSMC 0.18 at 50MHz Optional ETM & MPU gate counts not included

# Programmer's Model

# Cortex-M Programmer's Model

- Fully programmable in C
  - Assembly is not mandatory.
- Stack-based exception model
  - All corruptible registers automatically pushed onto stack upon exceptions in hardware
- Thumb & Thumb-2 instructions sets
- Only two processor modes
  - Thread Mode for User tasks
  - Handler Mode for exceptions
- Vector table contains addresses of exception handlers

# Registers of Cortex-M Processors



#### Stack Pointers

R13 is the stack pointer.



- Two stack pointers are banked so that only one is visible at a time.
- The two stack pointers are:
  - Main Stack Pointer (MSP): This is the default stack pointer, used by the OS kernel, exception handlers, and privileged-mode programs
  - Process Stack Pointer (PSP): Used by the user application code

## Program Counter

- R15 is the program counter (PC).
- When you read this register you will find that the value is different than the location of the executing instruction, due to the pipelining of the processor
  - Example:

```
0x1000: MOV RO, PC ; R0 = 0x1004
```

- When writing to the PC, it will cause a branch.
  - the LSB must be set to 1 to indicate the Thumb state operations (setting to 0 implies to switch to the ARM state, which will result in a fault exception in Cortex-M3)
- When reading the PC, the LSB (bit 0) is always 0.
  - Instructions are always aligned in 16-bit or 32-bit

# Link Register

- •R14 is the link register (LR).
- LR is used to store the return program counter when a subroutine or function is called.
- e.g., when using the BL (Branch with Link) instruction:

```
main: ; Main program
...
BL foo ; Call foo using Branch with Link instruction:
; PC = foo
; LR = the next instruction in main
...
foo:
... ; Program code for foo
BX LR ; Return
```

# Special Registers

- The special registers in the Cortex-M3 processor include:
  - Program Status Registers (PSRs)
  - Interrupt Mask Registers (PRIMASK, FAULTMASK, and BASEPRI)
  - Control Register (CONTROL)
- Can only be accessed via MSR and MRS instructions

```
E.g.,MRS < reg >, < special_reg > ; Read special registerMSR < special_reg >, < reg > ; write to special register
```

 Note: MSR and MRS cannot have memory addresses, only registers are allowed

# Program Status Registers (PSRs)

- The program status registers are subdivided into three status registers:
- Application/Interrupt/Execution PSR (A/I/EPSR)

|      | 31               | 30 | 29 | 28 | 27 | 26:25  | 24 | 23:20 | 19:16 | 15:10  | 9 | 8 | 7 | 6 | 5 | 4:0 |
|------|------------------|----|----|----|----|--------|----|-------|-------|--------|---|---|---|---|---|-----|
| APSR | Ν                | Z  | С  | ٧  | Q  |        |    |       |       |        |   |   |   |   |   |     |
| IPSR | Exception Number |    |    |    |    |        |    |       |       |        |   |   |   |   |   |     |
| EPSR |                  |    |    |    |    | ICI/IT | Т  |       |       | ICI/IT |   |   |   |   |   |     |

 When they are accessed as a collective item, the name xPSR is used (PSR used in program codes).

|      | 31 | 30 | 29 | 28 | 27 | 26:25  | 24 | 23:20 | 19:16 | 15:10  | 9 | 8   | 7     | 6    | 5    | 4:0 |
|------|----|----|----|----|----|--------|----|-------|-------|--------|---|-----|-------|------|------|-----|
| xPSR | N  | Z  | O  | ٧  | Q  | ICI/IT | Т  |       |       | ICI/IT |   | Exc | eptic | n Nu | ımbe | r   |

# Program Status Registers (PSRs)

EPSR and IPSR are read-only:

```
    MRS r0, APSR ; Read Flag state into R0
    MRS r0, IPSR ; Read Exception/Interrupt state
    MRS r0, EPSR ; Read Execution state
    MSR APSR, r0 ; Write Flag state
```

#### Accessing xPSR:

```
MRS r0, PSR ; Read the combined program status word
MSR PSR, r0 ; Write combined program state word
```

# Program Status Registers (PSRs)

| Bit              | Description                                                                  |
|------------------|------------------------------------------------------------------------------|
| N                | Negative                                                                     |
| Z                | Zero                                                                         |
| С                | Carry/borrow                                                                 |
| V                | Overflow                                                                     |
| Q                | Sticky saturation flag                                                       |
| ICI/IT           | Interrupt-Continuable Instruction (ICI) bits IF-THEN instruction status bit  |
| Т                | Thumb state, always 1; trying to clear this bit will cause a fault exception |
| Exception Number | Indicates which exception the processor is handling                          |

cument exception

**Bit Fields in Cortex-M3 Program Status Registers** 

### PRIMASK, FAULTMASK and BASEPRI Registers

- These registers are used to disable exceptions.
  - NMI: Non-Maskable Interrupt

| Register Name | Description                                                                                                                                                                      |
|---------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| PRIMASK       | A 1-bit register. When this is set, it allows NMI and the hard fault exception; all other interrupts and exceptions are masked; default is 0 (no masking)                        |
| FAULTMASK     | A 1-bit register. When this is set, it allows only the NMI, and all interrupts and fault handling exceptions are disabled; default is 0                                          |
| BASEPRI       | A register of up to 9 bits. It defines the masking priority level. When this is set, it disables all interrupts of the same or lower level (larger priority value); default is 0 |

**Cortex-M3 Interrupt Mask Registers** 

### PRIMASK, FAULTMASK and BASEPRI Registers

- To access the PRIMASK, FAULTMASK, and BASEPRI registers, the MRS and MSR instructions are used.
- Example:

```
MRS r0, BASEPRI ; Read BASEPRI register into R0
MRS r0, PRIMASK ; Read PRIMASK register into R0
MRS r0, FAULTMASK ; Read FAULTMASK register into R0
MSR BASEPRI, r0 ; Write R0 into BASEPRI register
MSR PRIMASK, r0 ; Write R0 into FAULTMASK register

MSR FAULTMASK, r0 ; Write R0 into FAULTMASK register
```

- PRIMASK and BASEPRI are useful for temporarily disabling interrupts in timing-critical tasks
- FAULTMASK is used by the OS kernel which cleans up a crashed task
- The PRIMASK, FAULTMASK, and BASEPRI registers cannot be set in the user access level.

# The Control Register

• The Control register is used to define the privilege level and the stack pointer selection. This register has two bits.

| Bit        | Function                                                                                          |
|------------|---------------------------------------------------------------------------------------------------|
| CONTROL[1] | Stack status:                                                                                     |
|            | 1 = Alternate stack is used                                                                       |
|            | 0 = Default stack (MSP) is used                                                                   |
|            | If it is in the Thread or base level, the alternate stack is the PSP. There is no alternate stack |
|            | for handler mode, so this bit must be zero when the processor is in handler mode.                 |
| CONTROL[0] | 0 = Privileged in Thread mode                                                                     |
|            | 1 = User state in Thread mode                                                                     |
|            | If in handler mode (not Thread mode), the processor operates in privileged mode.                  |

## The Control Register

#### CONTROL[1]

- In Cortex-M3, the CONTROL[1] bit is always 0 (MSP) in handler mode.
- However, in the Thread mode, it can be either 0 or 1.
- This bit is writable only when the core is in Thread mode and privileged.

#### • CONTROL[0]

The CONTRL[0] bit is writable only in privileged level.

To access the Control register, the MRS and MSR instructions are used:

```
MRS r0, CONTROL ; Read CONTROL register into R0 MSR CONTROL, r0 ; Write R0 into CONTROL register
```

# Operation Mode

- Two modes and two privilege levels.
- The operation modes determine whether the processor is running a normal program or running an exception handler.

|                           | Privileged Level | Unprivileged(User) Level |
|---------------------------|------------------|--------------------------|
| When running an exception | Handler Mode     |                          |
| When running main program | Thread Mode      | Thread Mode              |

**Operation Modes and Privilege Levels in Cortex-M3** 

#### **Processor Mode**

#### Handler Mode

- Used to handle exceptions.
- The processor returns to Thread mode when it has finished exception processing.
- In Handler mode, software execution is always privileged.

#### Thread Mode

- Used to execute application software.
- The processor enters Thread mode when it comes out of reset.
- In Thread mode, the CONTROL register controls whether software execution is privileged or unprivileged, see CONTROL register.

## Privilege Levels

#### Unprivileged (User)

#### The software:

- has limited access to the MSR (Move Register to Special Register) and MRS (Move Special Register to Register) instructions, and cannot use the CPS (Change Processor State) instruction
- cannot access the system timer, NVIC, or system control block
- might have restricted access to memory or peripherals.
- Unprivileged software executes at the unprivileged level

#### Privileged

 The software can use all the instructions and has access to all resources.

# Transition of Operation Mode



#### **Vector Tables**

- The vector table is an array of word data, with each representing the starting address of the handler for one exception/interrupt type.
  - In the Cortex-M3, vector addresses in the vector table should have their LSB set to 1 to indicate that they are Thumb code.
- The base address of the vector table is relocatable (set the relocation register in the NVIC); initially, the base address is 0x0.
- Example:
  - The reset is exception type 1. The address of the reset vector is 1 times 4, which equals 0x00000004; and NMI vector (type 2) is located in 2 \* 4 = 0x00000008
  - The word stored at address 0x00000000 is used as the starting value for the MSP.

#### **Vector Tables**

| _ |
|---|
|   |
| • |
| • |
| • |
| • |
|   |
| • |
|   |
|   |
| • |
|   |
| • |
| - |
|   |

| Address Offset | Exception Type | Exception Vector          |
|----------------|----------------|---------------------------|
| 0x3C           | 15             | SYSTICK                   |
| 0x38           | 14             | PendSV                    |
| 0x34           | 13             | Reserved                  |
| 0x30           | 12             | Debug Monitor             |
| 0x2C           | 11             | SVC                       |
| 0x1C-0x28      | 7-10           | Reserved                  |
| 0x18           | 6              | Usage fault               |
| 0x14           | 5              | Bus fault                 |
| 0x10           | 4              | MemManage fault           |
| 0x0C           | 3              | Hard fault                |
| 0x08           | 2              | NMI                       |
| 0x04           | 1              | Reset                     |
| 0x00           | 0              | Starting value of the MSP |

**Vector Table Definition After Reset** 

debut Stack of pointer

# Reset Sequence

- After the processor exits reset, it will read two words from memory:
  - Address 0x00000000: default value of R13 (MSP)
  - Address 0x00000004: Reset vector (the starting address of startup program; LSB should be set to 1 to indicate Thumb state)



Time

# Reset Sequence



Initial Stack Pointer Value and Initial Program Counter (PC) Value Example